230 research outputs found
Adversarial Variational Optimization of Non-Differentiable Simulators
Complex computer simulators are increasingly used across fields of science as
generative models tying parameters of an underlying theory to experimental
observations. Inference in this setup is often difficult, as simulators rarely
admit a tractable density or likelihood function. We introduce Adversarial
Variational Optimization (AVO), a likelihood-free inference algorithm for
fitting a non-differentiable generative model incorporating ideas from
generative adversarial networks, variational optimization and empirical Bayes.
We adapt the training procedure of generative adversarial networks by replacing
the differentiable generative network with a domain-specific simulator. We
solve the resulting non-differentiable minimax problem by minimizing
variational upper bounds of the two adversarial objectives. Effectively, the
procedure results in learning a proposal distribution over simulator
parameters, such that the JS divergence between the marginal distribution of
the synthetic data and the empirical distribution of observed data is
minimized. We evaluate and compare the method with simulators producing both
discrete and continuous data. Comment: v4: Final version published at AISTATS 2019; v5: Fixed typo in Eqn 1
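The variational optimization ingredient mentioned in the abstract above can be illustrated with a minimal sketch. This is not AVO itself: there is no discriminator and no real simulator. The `simulator_loss` step function, the Gaussian proposal with fixed `sigma`, and all hyperparameters are assumptions chosen for illustration. The sketch only shows how a non-differentiable objective can be minimized by taking score-function gradients with respect to the parameters of a proposal distribution.

```python
import random


def simulator_loss(theta):
    """Hypothetical non-differentiable objective: a step function that is zero near theta = 3."""
    return int(abs(theta - 3.0) * 4) / 4.0


def variational_optimization(loss, mu=0.0, sigma=1.0, lr=0.1, steps=300, batch=200, seed=0):
    """Minimise E_{theta ~ N(mu, sigma^2)}[loss(theta)] over mu with a score-function gradient."""
    rng = random.Random(seed)
    for _ in range(steps):
        thetas = [rng.gauss(mu, sigma) for _ in range(batch)]
        losses = [loss(t) for t in thetas]
        baseline = sum(losses) / batch  # batch mean used as a variance-reduction baseline
        # d/dmu log N(theta | mu, sigma^2) = (theta - mu) / sigma^2
        grad = sum(
            (l - baseline) * (t - mu) / sigma ** 2 for l, t in zip(losses, thetas)
        ) / batch
        mu -= lr * grad
    return mu


mu_hat = variational_optimization(simulator_loss)
print(f"proposal mean after optimization: {mu_hat:.2f}")
```

The loss is piecewise constant, so its derivative with respect to `theta` is zero almost everywhere, yet the smoothed objective over the proposal mean still has a usable gradient.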
Understanding Random Forests: From Theory to Practice
Data analysis and machine learning have become an integral part of the
modern scientific methodology, offering automated procedures for the prediction
of a phenomenon based on past observations, unraveling underlying patterns in
data and providing insights about the problem. Yet, one should avoid using
machine learning as a black-box tool, but rather consider it as a methodology,
with a rational thought process that is entirely dependent on the problem under
study. In particular, the use of algorithms should ideally require a reasonable
understanding of their mechanisms, properties and limitations, in order to
better apprehend and interpret their results.
Accordingly, the goal of this thesis is to provide an in-depth analysis of
random forests, consistently calling into question each and every part of the
algorithm, in order to shed new light on its learning capabilities, inner
workings and interpretability. The first part of this work studies the
induction of decision trees and the construction of ensembles of randomized
trees, motivating their design and purpose whenever possible. Our contributions
follow with an original complexity analysis of random forests, showing their
good computational performance and scalability, along with an in-depth
discussion of their implementation details, as contributed within Scikit-Learn.
In the second part of this work, we analyse and discuss the interpretability
of random forests through the lens of variable importance measures. The core of our
contributions lies in the theoretical characterization of the Mean Decrease of
Impurity variable importance measure, from which we prove and derive some of
its properties in the case of multiway totally randomized trees and in
asymptotic conditions. In consequence of this work, our analysis demonstrates
that variable importances [...]. Comment: PhD thesis. Source code available at
https://github.com/glouppe/phd-thesi
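As a small illustration of the quantity behind impurity-based importances discussed in the abstract above, the sketch below computes the impurity decrease of a single split. It assumes Shannon entropy as the impurity function and a binary split, whereas the abstract refers to multiway totally randomized trees. The function names and the toy labels are hypothetical. An importance measure of this kind accumulates such decreases, weighted by the fraction of samples reaching each node, over the nodes where a variable is used.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def impurity_decrease(labels, goes_left):
    """Impurity decrease i(t) - p_L * i(t_L) - p_R * i(t_R) for one binary split."""
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    decrease = entropy(labels)
    for child in (left, right):
        if child:  # an empty child contributes nothing
            decrease -= len(child) / n * entropy(child)
    return decrease


labels = [0, 0, 1, 1]
# A split that separates the two classes removes all impurity.
informative = impurity_decrease(labels, [True, True, False, False])
# A split that leaves both children evenly mixed removes none.
uninformative = impurity_decrease(labels, [True, False, True, False])
print(informative, uninformative)
```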
Approximating Likelihood Ratios with Calibrated Discriminative Classifiers
In many fields of science, generalized likelihood ratio tests are established
tools for statistical inference. At the same time, it has become increasingly
common that a simulator (or generative model) is used to describe complex
processes that tie parameters of an underlying theory and measurement
apparatus to high-dimensional observations.
However, simulators often do not provide a way to evaluate the likelihood
function for a given observation, which motivates a new class of
likelihood-free inference algorithms. In this paper, we show that likelihood
ratios are invariant under a specific class of dimensionality reduction maps.
As a direct consequence, we show that
discriminative classifiers can be used to approximate the generalized
likelihood ratio statistic when only a generative model for the data is
available. This leads to a new machine learning-based approach to
likelihood-free inference that is complementary to Approximate Bayesian
Computation, and which does not require a prior on the model parameters.
Experimental results on artificial problems with known exact likelihoods
illustrate the potential of the proposed method. Comment: 35 pages, 5 figures
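The idea in the abstract above, that a classifier trained to separate samples from two hypotheses can stand in for their likelihood ratio, can be sketched on a toy problem with known densities. Everything here is an assumption for illustration: the two unit-variance Gaussians, the plain logistic regression, and the names `fit_logistic`, `log_ratio_estimate` and `log_ratio_exact`. The sketch omits the calibration step that the title refers to. With balanced classes, the ideal classifier output is s(x) = p0(x) / (p0(x) + p1(x)), so s / (1 - s) recovers p0(x) / p1(x).

```python
import math
import random


def fit_logistic(xs, ys, lr=0.5, steps=300):
    """Fit s(x) = sigmoid(w * x + b) by full-batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = 1.0 / (1.0 + math.exp(-(w * x + b))) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
n = 1000
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # samples from p0 = N(0, 1), labelled 1
x1 = [rng.gauss(1.0, 1.0) for _ in range(n)]  # samples from p1 = N(1, 1), labelled 0
w, b = fit_logistic(x0 + x1, [1.0] * n + [0.0] * n)


def log_ratio_estimate(x):
    """log(s / (1 - s)) for a sigmoid classifier is its logit."""
    return w * x + b


def log_ratio_exact(x):
    """log p0(x) / p1(x) for the two unit-variance Gaussians above."""
    return 0.5 - x


for x in (-1.0, 0.0, 1.0):
    print(x, round(log_ratio_estimate(x), 2), log_ratio_exact(x))
```

Only samples are used to fit the classifier; the exact log ratio appears solely as a reference for comparison.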
Visualization of Publication Impact
Measuring scholarly impact has been a topic of much interest in recent years.
While many use the citation count as a primary indicator of a publication's
impact, the quality and impact of those citations will vary. Additionally, it
is often difficult to see where a paper sits among other papers in the same
research area. Questions we wished to answer through this visualization were:
is a publication cited less than publications in the field?; is a publication
cited by high or low impact publications?; and can we visually compare the
impact of publications across a result set? In this work we address the above
questions through a new visualization of publication impact. Our technique has
been applied to the visualization of citation information in INSPIREHEP
(http://www.inspirehep.net), the largest high energy physics publication
repository.
Mining gold from implicit models to improve likelihood-free inference
Simulators often provide the best description of real-world phenomena.
However, they also lead to challenging inverse problems because the density
they implicitly define is often intractable. We present a new suite of
simulation-based inference techniques that go beyond the traditional
Approximate Bayesian Computation approach, which struggles in a
high-dimensional setting, and extend methods that use surrogate models based on
neural networks. We show that additional information, such as the joint
likelihood ratio and the joint score, can often be extracted from simulators
and used to augment the training data for these surrogate models. Finally, we
demonstrate that these new techniques are more sample efficient and provide
higher-fidelity inference than traditional methods. Comment: Code available at
https://github.com/johannbrehmer/simulator-mining-example . v2: Fixed typos.
v3: Expanded discussion, added Lotka-Volterra example. v4: Improved clarity
Diffusion Priors In Variational Autoencoders
Among likelihood-based approaches for deep generative modelling, variational
autoencoders (VAEs) offer scalable amortized posterior inference and fast
sampling. However, VAEs are also increasingly outperformed by competing models
such as normalizing flows (NFs), deep-energy models, or the new denoising
diffusion probabilistic models (DDPMs). In this preliminary work, we improve
VAEs by demonstrating how DDPMs can be used for modelling the prior
distribution of the latent variables. The diffusion prior model improves upon
Gaussian priors of classical VAEs and is competitive with NF-based priors.
Finally, we hypothesize that hierarchical VAEs could similarly benefit from the
enhanced capacity of diffusion priors.
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Distributed asynchronous SGD has become widely used for deep learning in
large-scale systems, but remains notorious for its instability when increasing
the number of workers. In this work, we study the dynamics of distributed
asynchronous SGD through the lens of Lagrangian mechanics. Using this
description, we introduce the concept of energy to describe the optimization
process and derive a sufficient condition ensuring its stability as long as the
collective energy induced by the active workers remains below the energy of a
target synchronous process. Making use of this criterion, we derive a stable
distributed asynchronous optimization procedure, GEM, that estimates and
maintains the energy of the asynchronous system below or equal to the energy of
sequential SGD with momentum. Experimental results highlight the stability and
speedup of GEM compared to existing schemes, even when scaling to one hundred
asynchronous workers. Results also indicate better generalization compared to
the targeted SGD with momentum.
You say Normalizing Flows I see Bayesian Networks
Normalizing flows have emerged as an important family of deep neural networks
for modelling complex probability distributions. In this note, we revisit their
coupling and autoregressive transformation layers as probabilistic graphical
models and show that they reduce to Bayesian networks with a pre-defined
topology and a learnable density at each node. From this new perspective, we
provide three results. First, we show that stacking multiple transformations in
a normalizing flow relaxes independence assumptions and entangles the model
distribution. Second, we show that a fundamental leap of capacity emerges when
the depth of affine flows exceeds 3 transformation layers. Third, we prove the
non-universality of the affine normalizing flow, regardless of its depth.
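For readers unfamiliar with the affine coupling layers the abstract above refers to, here is a minimal two-dimensional sketch of one such layer. The conditioner functions `scale` and `shift` are hypothetical stand-ins for what would normally be learned networks. The first coordinate passes through unchanged and the second is transformed conditionally on it, which is the kind of dependency structure the note reads as an edge in a Bayesian network.

```python
import math


def scale(x1):
    """Hypothetical conditioner for the log-scale; any function of x1 alone would do."""
    return math.tanh(x1)


def shift(x1):
    """Hypothetical conditioner for the shift."""
    return 0.5 * x1


def coupling_forward(x1, x2):
    """Affine coupling: x1 passes through, x2 is scaled and shifted given x1."""
    y2 = x2 * math.exp(scale(x1)) + shift(x1)
    log_det = scale(x1)  # log |det Jacobian| of the triangular map
    return (x1, y2), log_det


def coupling_inverse(y1, y2):
    """Exact inverse: y1 equals x1, so scale and shift can be recomputed from it."""
    return y1, (y2 - shift(y1)) * math.exp(-scale(y1))


(y1, y2), log_det = coupling_forward(0.3, -1.2)
x1_rec, x2_rec = coupling_inverse(y1, y2)
print(x1_rec, x2_rec, log_det)
```

The Jacobian is triangular with diagonal entries 1 and exp(scale(x1)), which is why the log-determinant reduces to `scale(x1)`.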